System and method for identification of multimedia content elements

ABSTRACT

A system and method for identifying multimedia content elements are presented. The method includes generating a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; comparing a signature of the unknown MMCE to a signature cluster of at least one concept to determine whether each of the at least one concepts is proximate to the unknown MMCE; for each proximate concept, determining a probability that the at least one proximate concept identifies the unknown MMCE; and identifying the MMCE based on the determined probabilities of the at least one proximate concept.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/700,809, filed on Apr. 30, 2015, which claims priority from U.S. Provisional Patent Application No. 61/986,241, filed on Apr. 20, 2014.

TECHNICAL FIELD

The present disclosure relates generally to analysis of multimedia content, and more specifically to methods for recognizing elements appearing in multimedia content items using probabilistic models.

BACKGROUND

Identification of multimedia content elements shown in multimedia content is a challenging problem with many practical applications. Currently, available identification systems are mainly used in order to recognize such content. However, prior art systems are very limited in their ability to identify multimedia content elements that have been received for the first time. In particular, such systems typically focus on determining whether received multimedia content elements match known multimedia content elements. As a result, unknown multimedia content elements (such as those that have been received for the first time) are usually not recognized.

Furthermore, existing solutions are highly sensitive to changes in the received multimedia content elements. Occasionally, a small change in the way a multimedia content element was captured will make its identification much more complex and inaccurate. As an example, an identification system may determine whether a received image of a car includes a particular make and model of car based on determining whether the received image matches a known image of a car of that make and model. Existing identification systems often fail to properly identify the make and model of the car when, e.g., the received image of a car shows the car at a significantly different angle (e.g., from the rear right side) than an angle of the known image of the car (e.g., from the front left side).

Other solutions for the identification of multimedia content elements are based on metadata describing the content. However, metadata are often inadequately defined in the words needed to fully describe the multimedia content (e.g., pictures or video). For example, it may be desirable to locate a car of a particular model in a large database of video clips or segments. In some cases, the model of the car would be part of the metadata, but in many cases it would not. Similarly, if a piece of music, as in a sequence of notes, is to be identified, it is not necessarily the case that in all available content the notes are known in their metadata form, or for that matter, the search pattern may just be a brief audio clip.

It would therefore be advantageous to provide an efficient and accurate solution for identifying multimedia content elements.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term “some embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.

Certain embodiments disclosed herein include a method for identifying multimedia content elements. The method comprises generating a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; comparing a signature of the unknown MMCE to a signature cluster of at least one concept to determine whether each of the at least one concepts is proximate to the unknown MMCE; for each proximate concept, determining a probability that the at least one proximate concept identifies the unknown MMCE; and identifying the MMCE based on the determined probabilities of the at least one proximate concept.

Certain embodiments disclosed herein include a method for identifying multimedia content elements. The method comprises generating a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is at least a portion of a multimedia content item; determining whether the unknown MMCE is identifiable by the signature; upon determining that the unknown MMCE is identified by the signature, identifying the MMCE based on the signature; and upon determining that the MMCE is not identified by the signature: comparing a signature of the unknown MMCE to a signature cluster of at least one concept to determine whether each concept is proximate to the unknown MMCE; for each proximate concept, determining a probability that the at least one proximate concept identifies the MMCE; and identifying the MMCE based on the determined probabilities of the at least one proximate concept.

Certain embodiments disclosed herein include a system for identifying multimedia content elements. The system comprises a processing unit; a memory connected to the processing unit, the memory containing instructions that when executed by the processing unit, configure the system to: generate a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; compare a signature of the unknown MMCE to a signature cluster of at least one concept to determine whether each of the at least one concepts is proximate to the unknown MMCE; for each proximate concept, determine a probability that the at least one proximate concept identifies the unknown MMCE; and identify the MMCE based on the determined probabilities of the at least one proximate concept.

Certain embodiments disclosed herein include a system for identifying multimedia content elements. The system comprises a processing unit; a memory connected to the processing unit, the memory containing instructions that when executed by the processing unit, configure the system to: generate a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is at least a portion of a multimedia content item; determine whether the unknown MMCE is identifiable by the signature; upon determining that the unknown MMCE is identified by the signature, identify the MMCE based on the signature; and upon determining that the MMCE is not identified by the signature: compare a signature of the unknown MMCE to a signature cluster of at least one concept to determine whether each concept is proximate to the unknown MMCE; for each proximate concept, determine a probability that the at least one proximate concept identifies the MMCE; and identify the MMCE based on the determined probabilities of the at least one proximate concept.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a diagram of a network system utilized to describe the various disclosed embodiments;

FIG. 2 is a flowchart illustrating a process for identifying multimedia content elements shown in multimedia content items according to an embodiment;

FIG. 3 is a flowchart illustrating a method for determining which concepts have a high probability of accurately identifying a multimedia content element according to an embodiment;

FIG. 4 is a block diagram illustrating the basic flow of information in a signature generator system; and

FIG. 5 is a diagram illustrating the flow of patches generation, response vector generation, and signature generation in a large-scale speech-to-text system.

DETAILED DESCRIPTION

It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.

FIG. 1 shows an exemplary and non-limiting diagram of a network system 100 utilized to describe the various disclosed embodiments. A network 110 is used to communicate between different parts of the network system 100. The network 110 may be the Internet, the world-wide-web (WWW), a local area network (LAN), a wide area network (WAN), a metro area network (MAN), and other networks capable of enabling communication between the elements of the system 100.

Further connected to the network 110 is a user device 120 configured to execute at least one application 125. The application 125 may be, for example, a web browser, a script, a mobile or native application (“app”), a web application, or any application programmed to interact or communicate with a server 130. The user device 120 may be, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smart phone, a tablet computer, a laptop, a wearable computing device, or another kind of computing device equipped with browsing, viewing and managing capabilities that is enabled as further discussed herein below. It should be noted that only one user device 120 and one application 125 are illustrated in FIG. 1 only for the sake of simplicity and without limitation on the generality of the disclosed embodiments.

The network system 100 also includes a data warehouse 160 configured to store multimedia content items, multimedia content elements, concepts, concept structures, and the like. In the embodiment illustrated in FIG. 1, the server 130 communicates with the data warehouse 160 through the network 110. In other non-limiting embodiments, the server 130 is directly connected to the data warehouse 160.

The various embodiments disclosed herein are realized using the server 130, a signature generator system (SGS) 140, and a deep-content-classification (DCC) system 150. The SGS 140 and/or the DCC system 150 may be connected to the server 130 directly or through the network 110. The server 130 is configured to receive and serve the at least one multimedia content item in which multimedia content elements to be identified are included. The multimedia content item may be, but is not limited to, an image, a graphic, a video stream, a video clip, a video frame, a photograph, an audio clip, and/or combinations thereof and portions thereof.

In one embodiment, the server 130 is configured to receive a URL of a web-page viewed by the user device 120 and accessed by the application 125. The web-page is processed by the server 130 to extract the multimedia content item contained therein. The request to analyze the multimedia content item can be sent by a script executed in the web-page such as the application 125 (e.g., a web server or a publisher server) when requested to upload one or more multimedia content items to the web-page. Such a request may include a URL of the web-page or a copy of the web-page. The application 125 can also send a picture or a video clip taken by a user of the user device 120 to the server 130.

In an embodiment, the SGS 140 is configured to generate at least one signature for each of the multimedia content elements. To this end, according to some disclosed embodiments, the received element is partitioned into a plurality of elements of content. For each such element, at least one signature is generated. As an example, an image (multimedia content element) showing a smiling child with a Ferris wheel in the background is received. The content elements of the image may include the Ferris wheel and the child. Signatures may be generated for the image respective of the Ferris wheel and of the child using the process of signature generation disclosed in greater detail below with respect to FIGS. 4 and 5.

In an embodiment, the DCC system 150 is queried by the server 130 to identify the multimedia content elements respective of their signatures. A multimedia content element may be identified respective of its signature by, e.g., identifying a known multimedia content element that is identical to the multimedia content element to be identified. Upon determination that at least one multimedia content element is not identified, the DCC system 150 is queried again by the server 130 to find at least one concept that matches one or more of the multimedia content elements that were not identified. Initially attempting to identify multimedia content elements based on their signatures advantageously may reduce consumption of computing resources by only requiring determination of probabilities upon determining that at least one multimedia content element is unidentified.

It should be noted that the server 130 typically comprises a processing unit and a memory (not shown). The processing unit is coupled to the memory, which is configured to contain instructions that can be executed by the processing unit. The server also includes a network interface to the network.

In an embodiment, the processing unit may comprise, or be a component of, a larger processing unit implemented with one or more processors. The one or more processors may be implemented with any combination of general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gated logic, discrete hardware components, dedicated hardware finite state machines, or any other suitable entities that can perform calculations or other manipulations of information.

A concept is a collection of signatures representing a multimedia element and metadata describing the concept. The collection is a signature reduced cluster generated by inter-matching of the signatures generated for many multimedia content elements, clustering the inter-matched signatures, and providing a reduced cluster set of such clusters. As a non-limiting example, a ‘Superman concept’ is a signature reduced cluster of signatures describing elements (such as multimedia elements) related to, e.g., a Superman cartoon, together with a set of metadata including textual representations of the Superman concept.
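As an illustration only, a concept as defined above can be modeled as a record holding a signature cluster and its metadata. The sketch below is not taken from the '185 Patent; the class name `Concept`, its field names, and the representation of a signature as a tuple of bits are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption for illustration: a signature is a fixed-length tuple of 0/1 bits.
Signature = Tuple[int, ...]


@dataclass
class Concept:
    """Hypothetical container: a reduced signature cluster plus descriptive metadata."""

    signature_cluster: List[Signature]
    metadata: List[str] = field(default_factory=list)


# Example: a toy 'Superman concept' with two cluster signatures and textual metadata.
superman = Concept(
    signature_cluster=[(1, 0, 1, 1), (1, 1, 1, 0)],
    metadata=["Superman", "cartoon"],
)
```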

Techniques for generating concepts and concept structures are also described in the U.S. Pat. No. 8,266,185 (hereinafter the '185 Patent) to Raichelgauz, et al., which is assigned to a common assignee, and is incorporated by reference herein for all that it contains. In an embodiment, the DCC system 150 is configured and operates as the DCC system discussed in the '185 patent.

A concept may match an unidentified multimedia content element if a signature cluster of the concept matches a signature of the unidentified multimedia content element above a predefined matching threshold. A signature cluster and a signature may be matched as two signatures are matched. Signature matching is described further herein below with respect to FIGS. 4 and 5. The server 130 extracts at least one concept in proximity to one or more of the unidentified multimedia content elements. The extraction may be from a data warehouse, for example the data warehouse 160.

A concept is in proximity to a multimedia content element if a signature cluster associated with the concept matches one or more signatures of the multimedia content element above a predefined proximity threshold. As a non-limiting example, a concept may be proximate to a multimedia content element if a clustered signature of the concept matches a signature of the multimedia content element above 10%.
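The proximity test can be sketched as follows. The disclosure does not fix a scoring function here, so this example assumes binary signatures of equal length, scores two signatures by the fraction of agreeing bits, and scores a cluster by its best-matching member. The function names and the 10% default are illustrative only.

```python
from typing import Sequence

# Assumption for illustration: a signature is a sequence of 0/1 bits.
Signature = Sequence[int]


def match_score(a: Signature, b: Signature) -> float:
    """Fraction of positions at which two equal-length binary signatures agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_score(cluster: Sequence[Signature], signature: Signature) -> float:
    """Best score between the signature and any member of the signature cluster."""
    return max(match_score(member, signature) for member in cluster)


def is_proximate(
    cluster: Sequence[Signature],
    signature: Signature,
    proximity_threshold: float = 0.10,
) -> bool:
    """A concept is proximate when its cluster matches above the threshold."""
    return cluster_score(cluster, signature) > proximity_threshold
```

Other scoring functions could be substituted for `match_score` without changing the thresholding step.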

The server 130 determines the probability that at least one portion of an extracted concept matches each of the at least one unidentified multimedia content elements. In an embodiment, the probability may be equal to a signature matching score of the signature cluster of the proximate concept to the unidentified multimedia content element. As a non-limiting example, if a signature matching score of 20% is determined via signature matching between the signature cluster and the signature, the probability may be determined to be 20%.

As a non-limiting example of determining probabilities, an image (multimedia content element) showing a pencil and a highlighter is analyzed. The pencil is identified based on a known concept of pencils. The highlighter remains unidentified. The signature of the highlighter is compared to various concept structures. The comparison yields a signature matching score of 30%. It is determined that the signature of the highlighter matches the concept of pencils above the predefined proximity threshold of 25%. Accordingly, the concept of pencils is determined to be proximate to the highlighter multimedia content element, and the probability is determined to be 30%.

In an embodiment, the extracted concept that is determined to have the highest probability may be determined to identify the multimedia content element. This may be particularly important when, e.g., multiple concepts are determined to be proximate to the multimedia content element. In an optional embodiment, the highest probability is compared to an identification threshold. The identification threshold is typically a percentage or decimal value representing a minimal acceptable probability. In that embodiment, if the highest probability among extracted concepts is below the identification threshold, the multimedia content element may be identified as unknown.
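The selection step described above can be sketched as follows, assuming the probabilities of the proximate concepts have already been determined and are supplied as a mapping from concept name to probability. The function name `identify` and the use of the string "unknown" as a label are assumptions for this example.

```python
from typing import Dict, Optional


def identify(
    probabilities: Dict[str, float],
    identification_threshold: Optional[float] = None,
) -> str:
    """Return the concept with the highest probability, or "unknown"."""
    if not probabilities:
        # No proximate concept was found for the element.
        return "unknown"
    best = max(probabilities, key=lambda name: probabilities[name])
    if (
        identification_threshold is not None
        and probabilities[best] < identification_threshold
    ):
        # Optional embodiment: the best probability is below the minimum accepted.
        return "unknown"
    return best
```

For instance, `identify({"pencils": 0.30, "pens": 0.45})` selects "pens", whereas adding an identification threshold of 0.5 yields "unknown".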

It should be noted that each of the server 130, the SGS 140, and the DCC system 150 typically includes a processing unit, such as a processor (not shown) or an array of processors, coupled to a memory. In one embodiment, the processing unit may be realized through architecture of computational cores described in detail below. The memory contains instructions that can be executed by the processing unit. The server 130 also includes an interface (not shown) to the network 110.

In some non-limiting embodiments, the operation of the server 130 is executed locally on the user device 120. In an embodiment, the user device 120 comprises at least a partial DCC system (not shown in FIG. 1 as part of device 120) as well as a partial signature generation system (not shown in FIG. 1 as part of device 120).

FIG. 2 depicts an exemplary and non-limiting flowchart 200 describing a method for identifying multimedia content elements shown in a multimedia content item according to an embodiment. In an embodiment, the method may be performed by a server (e.g., the server 130) or by a user device (e.g., the user device 120).

In S210, a multimedia content item having at least one multimedia content element is received. In an embodiment, the multimedia content item is received together with a request to identify the multimedia content elements shown in the multimedia content item.

In S220, at least one signature is generated respective of each multimedia content element. In an embodiment, signatures may be generated by the SGS 140 as described in greater detail herein below with respect to FIGS. 4 and 5.

In S230, a DCC system (e.g., the DCC system 150) is queried to identify each of the multimedia content elements. According to one embodiment, the identification may be made through a data warehouse (e.g., the data warehouse 160). According to another embodiment, the identification is made through one or more data sources accessible over a network (e.g., the network 110).

In S240, it is checked whether all multimedia content elements were identified and, if so, execution continues with S280; otherwise, execution continues with S250. In S250, the DCC system is queried to find at least one concept that is proximate to each of the unidentified multimedia content elements. Concepts and proximity to multimedia content elements are described further herein above with respect to FIG. 1.

In S260, at least one concept in proximity to the at least one unidentified multimedia content element is extracted from, for example, the data warehouse 160. A concept is in proximity to a multimedia content element if a signature cluster associated with the concept matches one or more signatures of the multimedia content element above a predefined proximity threshold. As a non-limiting example, a concept may be proximate to a multimedia content element if a clustered signature of the concept has a matching score with a signature of the multimedia content element that is above 10%.

In S270, the probability that each extracted concept matches each of the one or more unidentified multimedia content elements is determined. Determining probabilities that concepts match unidentified multimedia content elements is described further herein above with respect to FIG. 1.

In S280, it is checked whether additional multimedia content items have been received and, if so, execution continues with S210; otherwise, execution terminates.
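The flow of S220 through S270 for a single multimedia content item can be sketched as follows. The signature generator, the exact-identification query, and the proximate-concept query are passed in as callables because their implementations are outside the scope of this sketch; all names below are hypothetical and do not describe an actual interface of the SGS 140 or the DCC system 150.

```python
from typing import Callable, Dict, Hashable, Optional, Union

# An element is either identified by a label or described by concept probabilities.
Result = Union[str, Dict[str, float]]


def process_item(
    elements: Dict[str, object],
    generate_signature: Callable[[object], Hashable],
    identify_exact: Callable[[Hashable], Optional[str]],
    find_proximate: Callable[[Hashable], Dict[str, float]],
) -> Dict[str, Result]:
    """Run S220 to S270 for every element of one received item (S210)."""
    results: Dict[str, Result] = {}
    for name, element in elements.items():
        signature = generate_signature(element)   # S220: generate a signature
        label = identify_exact(signature)         # S230: query for identification
        if label is not None:                     # S240: element was identified
            results[name] = label
            continue
        # S250 and S260: extract proximate concepts with their matching scores.
        # S270: each probability is taken to equal the matching score.
        results[name] = dict(find_proximate(signature))
    return results
```

Deferring the proximity query until exact identification fails mirrors the resource saving noted with respect to FIG. 1.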

FIG. 3 is an exemplary and non-limiting flowchart 300 illustrating a method for determining which concepts have a high probability of accurately identifying a multimedia content element according to an embodiment. In S310, a request to determine which concepts have a high probability of accurately identifying a multimedia content element (MMCE) is received. In an embodiment, the request may further include the concepts and the multimedia content element.

In S320, a signature is generated for the multimedia content element. In S330, the generated signature is compared to a cluster of signatures of each concept to determine whether each concept is in proximity to the multimedia content element. Signature matching is described further herein below with respect to FIGS. 4 and 5.

In S340, a probability that a portion of each proximate concept represents the multimedia content element is determined. In an embodiment, the probability is equal to a matching score between the concept and the unidentified multimedia content element. Determination of probabilities is described further herein above with respect to FIG. 1.

In S350, the concept having the highest probability among proximate concepts is determined to identify the multimedia content element. In an embodiment, more than one concept having higher probabilities than the other concepts may be identified. As a non-limiting example, a set number of concepts (e.g., 3 concepts) having the highest probabilities among concepts may be identified. In that example, among concepts with probabilities of 0.3, 0.4, 0.55, 0.67, and 0.99, respectively, the concepts with probabilities of 0.55, 0.67, and 0.99 may be identified.
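Selecting a set number of highest-probability concepts can be sketched with the standard-library `heapq.nlargest`. The concept names below are placeholders; the probabilities are those of the example above.

```python
import heapq
from typing import Dict, List, Tuple


def top_concepts(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (concept, probability) pairs with the highest probabilities."""
    return heapq.nlargest(k, probabilities.items(), key=lambda item: item[1])


# Placeholder concept names paired with the probabilities from the example.
example = {"c1": 0.3, "c2": 0.4, "c3": 0.55, "c4": 0.67, "c5": 0.99}
selected = top_concepts(example, k=3)
```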

In another embodiment, if no concept is proximate to the unidentified multimedia content element, that multimedia content element may be identified as unknown. As a non-limiting example, if the proximity threshold is 0.6, a multimedia content element with a matching score of 0.5 to just one concept structure will be identified as unknown.

As a non-limiting example, a request to determine which concepts have a high probability of accurately identifying a multimedia content element is received. The request contains a multimedia content element featuring a dog and a dog chew toy as well as a concept representing dog chew toys. A signature is generated for the multimedia content element of the dog and for the dog chew toy. The generated signatures are compared to the cluster of signatures of the dog chew toy concept. Based on this comparison, it is determined that the dog chew toy concept identifies the dog chew toy multimedia content element and that the dog multimedia content element is unidentified. A signature of the dog multimedia content element is compared to a signature cluster of the dog chew toy concept, thereby yielding a matching score of 40%, which is above a proximity threshold of 15%. As a result, the probability is determined to be 40%, and the multimedia content element featuring a dog is identified based on the dog chew toy concept.

As another non-limiting example, a request to determine which concepts have a high probability of accurately identifying a multimedia content element is received. The request contains a multimedia content element featuring a cookie and a book as well as a concept representing cookies. A signature is generated for the multimedia content element of the cookie and for the book. The generated signatures are compared to the cluster of signatures of the cookie concept. Based on this comparison, it is determined that the cookie concept identifies the cookie multimedia content element and that the book multimedia content element is unidentified. A signature of the book multimedia content element is compared to a signature cluster of the cookie concept, thereby yielding a matching score of 1%, which is below a proximity threshold of 15%. As a result, the book is identified as unknown.
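The arithmetic of the two examples above can be checked with a short sketch in which the probability equals the matching score when the score exceeds the proximity threshold, and no probability is assigned otherwise. The function name is an assumption for this example.

```python
from typing import Optional


def probability_if_proximate(
    matching_score: float, proximity_threshold: float
) -> Optional[float]:
    """Return the matching score as the probability, or None if not proximate."""
    if matching_score > proximity_threshold:
        return matching_score
    return None


# Dog element against the dog chew toy concept: 40% exceeds the 15% threshold.
dog = probability_if_proximate(0.40, 0.15)
# Book element against the cookie concept: 1% does not exceed the 15% threshold.
book = probability_if_proximate(0.01, 0.15)
```

Here `dog` holds a probability of 0.40, while `book` is `None`, which corresponds to the book being identified as unknown.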

FIGS. 4 and 5 illustrate the generation of signatures for the multimedia content items by the SGS 140 according to one embodiment. An exemplary high-level description of the process for large scale matching is depicted in FIG. 4. In this example, the matching is conducted based on video content.

Video content segments 2 from a Master database (DB) 6 and a Target DB 1 are processed in parallel by a large number of independent computational cores 3 that constitute an architecture for generating the Signatures (hereinafter the “Architecture”). Further details on the generation of computational cores are provided below. The independent cores 3 generate a database of Robust Signatures and Signatures 4 for Target content-segments 5 and a database of Robust Signatures and Signatures 7 for Master content-segments 8. An exemplary and non-limiting process of signature generation for an audio component is shown in detail in FIG. 5. Finally, Target Robust Signatures and/or Signatures are effectively matched, by a matching algorithm 9, to Master Robust Signatures and/or Signatures database to find all matches between the two databases.

To demonstrate an example of the signature generation process, it is assumed, merely for the sake of simplicity and without limitation on the generality of the disclosed embodiments, that the signatures are based on a single frame, leading to certain simplification of the computational cores generation. The Matching System is extensible for signatures generation capturing dynamics in-between the frames.

The Signatures' generation process is now described with reference to FIG. 5. The first step in the process of signatures generation from a given speech-segment is to break down the speech-segment into K patches 14 of random length P and random position within the speech segment 12. The breakdown is performed by the patch generator component 21. The values of the number of patches K, random length P, and random position parameters are determined based on optimization, considering the tradeoff between accuracy rate and the number of fast matches required in the flow process of the server 130 and SGS 140. Thereafter, all the K patches are injected in parallel into all computational cores 3 to generate K response vectors 22, which are fed into a signature generator system 23 to produce a database of Robust Signatures and Signatures 4.
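The patch breakdown can be sketched as follows, assuming the segment is available as a sequence of samples. The uniform sampling of length and position, the length bounds, and the fixed seed are assumptions for this example; the disclosure states only that these parameters are determined by optimization.

```python
import random
from typing import List, Sequence


def generate_patches(
    segment: Sequence[float],
    k: int,
    min_len: int,
    max_len: int,
    seed: int = 0,
) -> List[Sequence[float]]:
    """Cut K contiguous patches of random length and random position."""
    if not 1 <= min_len <= max_len <= len(segment):
        raise ValueError("require 1 <= min_len <= max_len <= len(segment)")
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    patches = []
    for _ in range(k):
        length = rng.randint(min_len, max_len)            # random length P
        start = rng.randint(0, len(segment) - length)     # random position
        patches.append(segment[start:start + length])
    return patches
```

Each returned patch would then be injected into the computational cores to obtain a response vector.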

In order to generate Robust Signatures, i.e., Signatures that are robust to additive noise, by the L computational cores 3 (where L is an integer equal to or greater than 1), a frame ‘i’ is injected into all the cores 3. The cores 3 then generate two binary response vectors: S, which is a Signature vector, and RS, which is a Robust Signature vector.

For generation of signatures robust to additive noise, such as White-Gaussian-Noise, scratch, etc., but not robust to distortions, such as crop, shift and rotation, etc., a core $C_{i} = \{n_{i}\}$ ($1 \le i \le L$) may consist of a single leaky integrate-to-threshold unit (LTU) node or more nodes. The node $n_{i}$ equations are:

$V_{i} = \sum\limits_{j} w_{ij} k_{j}$

$n_{i} = \Pi(V_{i} - Th_{x})$

where $\Pi$ is a Heaviside step function; $w_{ij}$ is a coupling node unit (CNU) between node i and image component j (for example, grayscale value of a certain pixel j); $k_{j}$ is an image component ‘j’ (for example, grayscale value of a certain pixel j); $Th_{x}$ is a constant Threshold value, where ‘x’ is ‘S’ for Signature and ‘RS’ for Robust Signature; and $V_{i}$ is a Coupling Node Value.
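The node equations can be illustrated numerically as follows. The weights, image components, and thresholds are made-up values, and the step function is taken to return 1 only for strictly positive arguments, which is one common convention and an assumption of this sketch.

```python
from typing import List, Sequence


def core_responses(
    weights: Sequence[Sequence[float]],
    components: Sequence[float],
    threshold: float,
) -> List[int]:
    """Binary response of each node: 1 if V_i exceeds the threshold, else 0."""
    responses = []
    for row in weights:
        # Coupling Node Value: V_i = sum over j of w_ij * k_j
        v = sum(w * k for w, k in zip(row, components))
        # Heaviside step applied to (V_i - Th_x), strict at zero by assumption
        responses.append(1 if v - threshold > 0 else 0)
    return responses


# Illustrative values only: three nodes, two image components.
w = [[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]]
k = [0.8, 0.2]
signature = core_responses(w, k, threshold=0.4)          # Th_S
robust_signature = core_responses(w, k, threshold=0.7)   # Th_RS
```

With these values the Coupling Node Values are approximately 0.8, 0.5, and -0.2, so the higher threshold used for the Robust Signature keeps only the first node.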

The Threshold values $Th_{x}$ are set differently for Signature generation than for Robust Signature generation. For example, for a certain distribution of $V_{i}$ values (for the set of nodes), the thresholds for Signature ($Th_{S}$) and Robust Signature ($Th_{RS}$) are set apart, after optimization, according to at least one of the following criteria:

1: For $V_{i} > Th_{RS}$:

$1 - p(V > Th_{S}) = 1 - (1 - \epsilon)^{l} \ll 1$

i.e., given that l nodes (cores) constitute a Robust Signature of a certain image I, the probability that not all of these l nodes will belong to the Signature of the same, but noisy, image Ĩ is sufficiently low (according to a system's specified accuracy).

2: $p(V_{i} > Th_{RS}) \approx l/L$

i.e., approximately l out of the total L nodes can be found to generate a Robust Signature according to the above definition.

3: Both Robust Signature and Signature are generated for a certain frame i.
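The first two criteria can be illustrated with a small numerical sketch. It rests on two assumptions not stated in the disclosure: that ε denotes the probability that a single Robust Signature node drops out of the Signature of the noisy image, independently of the other nodes, and that the Robust Signature threshold is placed midway between the l-th and (l+1)-th largest Coupling Node Values. The function names are hypothetical.

```python
from typing import Sequence


def robust_threshold(values: Sequence[float], l: int) -> float:
    """Place Th_RS so that l of the L values exceed it (criterion 2).

    If the l-th and (l+1)-th largest values are tied, no threshold separates
    them and fewer than l values will exceed the returned threshold.
    """
    ordered = sorted(values, reverse=True)
    if not 0 < l < len(ordered):
        raise ValueError("require 0 < l < number of nodes")
    return (ordered[l - 1] + ordered[l]) / 2


def prob_not_all_survive(epsilon: float, l: int) -> float:
    """1 - (1 - epsilon)^l: chance that at least one of l nodes drops out."""
    return 1 - (1 - epsilon) ** l


# Illustrative Coupling Node Values for L = 5 nodes, with l = 2.
v_values = [0.1, 0.9, 0.5, 0.7, 0.3]
th_rs = robust_threshold(v_values, l=2)
above = sum(v > th_rs for v in v_values)
risk = prob_not_all_survive(epsilon=0.001, l=10)
```

Under these assumptions, ten robust nodes with ε = 0.001 give a risk slightly below 1%, which illustrates what criterion 1 means by a sufficiently low probability.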

It should be understood that the generation of a signature is unidirectional, and typically yields lossy compression, where the characteristics of the compressed data are maintained but the uncompressed data cannot be reconstructed. Therefore, a signature can be used for the purpose of comparison to another signature without the need for comparison to the original data. The detailed description of the Signature generation can be found in U.S. Pat. Nos. 8,326,775 and 8,312,031, assigned to common assignee, which are hereby incorporated by reference for all the useful information they contain.

A computational core generation is a process of definition, selection, and tuning of the parameters of the cores for a certain realization in a specific system and application. The process is based on several design considerations, such as:

(a) The cores should be designed so as to obtain maximal independence, i.e., the projection from a signal space should generate a maximal pair-wise distance between any two cores' projections into a high-dimensional space.

(b) The cores should be optimally designed for the type of signals, i.e., the cores should be maximally sensitive to the spatio-temporal structure of the injected signal, for example, and in particular, sensitive to local correlations in time and space. Thus, in some cases, a core represents a dynamic system, such as in state space, phase space, edge of chaos, etc., which is uniquely used herein to exploit its maximal computational power.

(c) The cores should be optimally designed with regard to invariance to a set of signal distortions, of interest in relevant applications.

The computational core generation and the process for configuring such cores are discussed in more detail in U.S. patent application Ser. No. 12/084,150 referenced above.

The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiments and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

We claim:
 1. A computerized method for identifying multimedia content elements, comprising: generating a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; comparing the signature for the unknown MMCE to signatures of at least one concept to determine whether each of the at least one concept is proximate to the unknown MMCE, wherein each of the at least one concept is a collection of signatures representing a plurality of multimedia elements and metadata describing the at least one concept; for each proximate concept according to the comparing, determining a probability that the proximate concept identifies the unknown MMCE; and identifying the unknown MMCE based on determined probabilities of the at least one proximate concept, wherein the determined probabilities are in accordance with the determining.
 2. The method of claim 1, further comprising: determining whether a highest probability of the determined probabilities is above an identification threshold; and upon determining that the highest probability is above the identification threshold, identifying the unknown MMCE as known.
 3. The method of claim 1, wherein identifying the unknown MMCE further comprises: determining a plurality of proximate concepts having the highest probabilities among the at least one proximate concept, wherein the unknown MMCE is identified as potentially being related to each of the at least one proximate concept having the highest probabilities.
 4. The method of claim 3, wherein each of the at least one proximate concept is determined respective of other known MMCEs included in the multimedia content item.
 5. The method of claim 1, wherein the comparing yields a matching score, wherein the probability is equal to the matching score.
 6. The method of claim 1, further comprising: generating a new concept using an identified MMCE according to the identifying.
 7. The method of claim 1, wherein the multimedia content item is any of: an image, a graphic, a video stream, a video clip, a video frame, an audio clip, or a photograph.
 8. A non-transitory computer readable medium having stored thereon instructions for causing at least one processing unit to execute method steps comprising: generating a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; comparing the signature for the unknown MMCE to signatures of at least one concept to determine whether each of the at least one concept is proximate to the unknown MMCE, wherein each of the at least one concept is a collection of signatures representing a plurality of multimedia elements and metadata describing the at least one concept; for each proximate concept according to the comparing, determining a probability that the proximate concept identifies the unknown MMCE; and identifying the unknown MMCE based on determined probabilities of the at least one proximate concept, wherein the determined probabilities are in accordance with the determining.
 9. (canceled)
 10. The method of claim 9, wherein identifying the unknown MMCE further comprises: determining whether a highest probability among the determined probabilities is below a predefined identification threshold; and upon determining that the highest probability is below the predefined identification threshold, identifying the unknown MMCE as unknown.
 11. The method of claim 9, wherein identifying the unknown MMCE further comprises: determining a plurality of proximate concepts having the highest probabilities among the at least one proximate concept, wherein the unknown MMCE is identified as potentially being related to each of the plurality of proximate concepts.
 12. The method of claim 9, wherein each probability is determined to be equal to a proportion between: a number of signatures of the proximate concept that match the signature of the unknown MMCE and a total number of signatures in the proximate concept.
 13. The method of claim 9, wherein the multimedia content item is any of: an image, a graphic, a video stream, a video clip, a video frame, or a photograph.
 14. (canceled)
 15. A system for identifying multimedia content elements, comprising: a processing unit; and a memory connected to the processing unit, the memory containing instructions that when executed by the processing unit, configure the system to: generate a signature for an unknown multimedia content element (MMCE), wherein the unknown MMCE is a portion of a multimedia content item; perform a comparison of the signature of the unknown MMCE to signatures of at least one concept to determine whether each of the at least one concept is proximate to the unknown MMCE, wherein each of the at least one concept is a collection of signatures representing a plurality of multimedia elements and metadata describing the at least one concept; for each proximate concept according to the comparison, determine a probability that the at least one proximate concept identifies the unknown MMCE; and identify the unknown MMCE based on determined probabilities of the at least one proximate concept, wherein the determined probabilities are in accordance with the probability for each proximate concept.
 16. The system of claim 15, further configured to: determine whether a highest probability of the determined probabilities is above an identification threshold; and upon determining that the highest probability is above the identification threshold, identify the unknown MMCE as known.
 17. The system of claim 15, wherein identifying the unknown MMCE further comprises: determining a plurality of proximate concepts having the highest probabilities among the at least one proximate concept, wherein the unknown MMCE is identified as potentially being related to each of the at least one proximate concept.
 18. The system of claim 17, wherein each of the at least one proximate concept is determined respective of other known MMCEs included in the multimedia content item.
 19. The system of claim 15, wherein the comparison yields a matching score, wherein the probability is equal to the matching score.
 20. The system of claim 15, wherein an identified MMCE is utilized to create a new concept, wherein the identified MMCE is identified based on the determined probabilities of the at least one proximate concept.
 21. The system of claim 15, wherein the multimedia content item is any of: an image, a graphic, a video stream, a video clip, a video frame, an audio clip, or a photograph.
 22-31. (canceled)